mtmd: be able to use alternative types for the K*Q multiplication #1567

Merged
ikawrakow merged 4 commits into main from ik/mtmd_kq_type on Apr 2, 2026

@ikawrakow
Owner

I thought I should give some attention to the multi-modality stuff. The initial idea was that I would enable flash attention (FA). But that turned out to be too big a change, as multi-modality models like to use strange attention head sizes. While looking into this, I noticed that a very large fraction of the time needed to encode the image is spent in the K*Q matrix multiplication. So, I decided to see if that could be made somewhat faster.

When not using FA, the K*Q matrix multiplication is done using 32-bit floats. An obvious thing to try is to see if downcasting to f16/bf16, or perhaps even to Q8_0, would bring some performance benefit. Hence, this PR adds the ability to define the type used for the K*Q matrix multiplication via a command line argument:

--mtmd-kq-type type
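
To illustrate what downcasting the K*Q operands means numerically, here is a minimal Python sketch. It is not the code from this PR. It rounds float32 values to bf16 precision and compares a dot product over one row of head size 72 against the full-precision result. The helper names `to_bf16` and `dot` and the sample data are made up for illustration, and accumulation happens in Python floats, which may differ from what the real kernels do.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float.

    Sketch only: assumes x is finite and within float32 range.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the upper 16 bits; add a bias so truncation rounds to nearest even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def dot(a, b, cast=lambda v: v):
    """Dot product of a and b after passing every element through cast."""
    return sum(cast(x) * cast(y) for x, y in zip(a, b))

# Hypothetical K and Q rows with head size 72 (all positive, for a simple error bound).
k_row = [(i + 1) / 72 for i in range(72)]
q_row = [1.0 / (i + 1) for i in range(72)]

exact = dot(k_row, q_row)
approx = dot(k_row, q_row, cast=to_bf16)
rel_err = abs(approx - exact) / exact
print(f"relative error with bf16 operands: {rel_err:.2e}")
```

bf16 keeps only 8 significant bits, so each operand carries a relative rounding error of up to about 0.4%. Whether that is acceptable for the vision encoder is an empirical question that this sketch does not answer.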

Somewhat surprisingly, I only see a performance improvement when running CPU-only on a Zen4 CPU (Ryzen-7950X) and using --mtmd-kq-type bf16. In that case, for a 1 MiB image, which generates 4015 image tokens, encoding time is reduced from 76 seconds to 65 seconds, roughly a 14% reduction. (I thought that was much too long, so I tested the same image with today's llama.cpp. It needed ~300 seconds to encode the same image on the same CPU.)

I also played with converting to Q8_0. That seems to work just fine (in terms of the generated response), but does not give a performance benefit. I guess part of the issue is that the Qwen3 vision encoder has a head size of 72, so to use Q8_0 one must pad K and Q to a row size of 96, which a) takes time and b) makes the matrix multiplication about 33% larger (96/72 ≈ 1.33, since only the inner dimension grows).
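
The padding step mentioned above can be sketched as follows. This is a minimal Python illustration, not the PR's code. It assumes the Q8_0 block size of 32 values used by ggml, and the helper names `padded_row_size` and `pad_row` are hypothetical. Rows are zero-padded up to the next multiple of the block size, which leaves each dot product unchanged but adds work.

```python
QK8_0 = 32  # assumed Q8_0 block size (values per block), as in ggml

def padded_row_size(n: int, block: int = QK8_0) -> int:
    """Smallest multiple of block that is >= n."""
    return (n + block - 1) // block * block

def pad_row(row, block: int = QK8_0):
    """Return a copy of row, zero-padded to a multiple of block elements."""
    row = list(row)
    return row + [0.0] * (padded_row_size(len(row), block) - len(row))

head_size = 72
k_row = [0.5] * head_size  # hypothetical K row
q_row = [2.0] * head_size  # hypothetical Q row

k_pad, q_pad = pad_row(k_row), pad_row(q_row)

# Zero padding adds only zero terms, so the dot product is unchanged.
dot_orig = sum(a * b for a, b in zip(k_row, q_row))
dot_pad = sum(a * b for a, b in zip(k_pad, q_pad))
print(len(k_pad), dot_orig == dot_pad)
```

With a head size of 72 this yields rows of 96 values, that is, three Q8_0 blocks per row.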

@vikcious
vikcious commented Apr 1, 2026

... and congrats on 1000 closed PRs! I guess that these are what DO matter and prove your journey right, not the Git stars! 🙏🥃

@ikawrakow ikawrakow merged commit 73742c5 into main Apr 2, 2026